Correcting for Cryptic Relatedness in Genome-Wide Association Studies
Abstract
While the individuals chosen for a genome-wide association study (GWAS) may not be closely related to each other, there can be distant (cryptic) relationships that confound the evidence of disease association. These cryptic relationships violate the GWAS assumption that the subjects' genomes are independent, and failing to account for them produces both false positives and false negatives. This paper presents a method to correct for these cryptic relationships. We accurately detect distant relationships using an expectation-maximization (EM) algorithm that estimates the identity coefficients from genotype data with no prior knowledge of relationships. From the identity coefficients we compute the kinship coefficients and employ a kinship-corrected association test. We assess the accuracy of our EM kinship estimation algorithm. We show that on genomes simulated from a Wright-Fisher pedigree, our method converges quickly and requires only a relatively small number of sites to be accurate. We also demonstrate that our kinship coefficient estimates outperform state-of-the-art covariance-based approaches and PLINK's kinship estimate. To assess the kinship-corrected association test, we simulated individuals from deep pedigrees and drew one site to recessively determine the disease status. We applied our EM algorithm to estimate the kinship coefficients and ran a kinship-adjusted association test. Our approach compares favorably with the state of the art and far outperforms a naïve association test. We advocate using our method to detect cryptic relationships and to correct association tests. Our model is easy to interpret because it uses identity states as latent variables, and its inference provides state-of-the-art accuracy.
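The step from identity coefficients to kinship coefficients can be illustrated with a minimal sketch. It assumes the identity coefficients are the nine condensed (Jacquard) coefficients Δ1 to Δ9, which the abstract does not state explicitly, and it uses the standard textbook relation φ = Δ1 + ½(Δ3 + Δ5 + Δ7) + ¼Δ8. The function name `kinship_from_identity` is hypothetical, and the EM estimation of the coefficients themselves is not shown.

```python
from typing import Sequence


def kinship_from_identity(delta: Sequence[float], tol: float = 1e-9) -> float:
    """Kinship coefficient from nine condensed identity coefficients.

    Assumption: `delta` holds Jacquard's condensed coefficients
    (Delta_1 .. Delta_9) in order, and they sum to 1.
    """
    if len(delta) != 9:
        raise ValueError("expected nine condensed identity coefficients")
    if any(d < -tol for d in delta):
        raise ValueError("identity coefficients must be non-negative")
    if abs(sum(delta) - 1.0) > tol:
        raise ValueError("identity coefficients must sum to 1")

    d1, _d2, d3, _d4, d5, _d6, d7, d8, _d9 = delta
    # Standard relation: phi = D1 + (D3 + D5 + D7) / 2 + D8 / 4
    return d1 + 0.5 * (d3 + d5 + d7) + 0.25 * d8


# Textbook coefficient vectors for non-inbred pairs (illustrative inputs,
# not results from the paper).
unrelated = [0, 0, 0, 0, 0, 0, 0.0, 0.0, 1.0]
parent_offspring = [0, 0, 0, 0, 0, 0, 0.0, 1.0, 0.0]
full_siblings = [0, 0, 0, 0, 0, 0, 0.25, 0.5, 0.25]
first_cousins = [0, 0, 0, 0, 0, 0, 0.0, 0.25, 0.75]

for name, vec in [
    ("unrelated", unrelated),
    ("parent-offspring", parent_offspring),
    ("full siblings", full_siblings),
    ("first cousins", first_cousins),
]:
    print(f"{name}: {kinship_from_identity(vec)}")
```

For non-inbred pairs only Δ7, Δ8 and Δ9 are non-zero, so the kinship coefficient reduces to ½Δ7 + ¼Δ8; a cryptic (distant) relationship corresponds to a small but non-zero Δ8.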
Similar resources
Estimating inflation in GWAS summary statistics due to variance distortion from cryptic relatedness
Cryptic relatedness is inherently a feature of large genome-wide association studies (GWAS), and can give rise to considerable inflation in summary statistics for single nucleotide polymorphism (SNP) associations with phenotypes. It has proven difficult to disentangle these inflationary effects from true polygenic effects. Here we present results of a model that enables estimation of polygenici...
Correcting for cryptic relatedness in population-based association studies of continuous traits.
Cryptic relatedness was suggested to be an important source of confounding in population-based association studies (PBAS). The magnitude and manner of cryptic relatedness affecting the performance of PBAS of continuous traits remain to be investigated. We simulated a set of related samples through biased sampling and inbreeding, and evaluated the power and type I error rates of simple associati...
PREST-plus identifies pedigree errors and cryptic relatedness in the GAW18 sample using genome-wide SNP data
Pedigree errors and cryptic relatedness often appear in families or population samples collected for genetic studies. If not identified, these issues can lead to either increased false negatives or false positives in both linkage and association analyses. To identify pedigree errors and cryptic relatedness among individuals from the 20 San Antonio Family Studies (SAFS) families and cryptic rela...
Genetic Studies: The Linear Mixed Models in Genome-wide Association Studies
With the availability of high-density genomic data containing millions of single nucleotide polymorphisms and tens or hundreds of thousands of individuals, genetic association study is likely to identify the variants contributing to complex traits in a genome-wide scale. However, genome-wide association studies are confounded by some spurious associations due to not properly interpreting sample...
Using Dimension Reduction Techniques to Model Genetic Relationships for Association Studies Thesis Proposal for
Beyond a few degrees of relationship pedigrees are rarely known with absolute certainty. This uncertainty is often elevated in population isolates, in which all extant individuals trace their ancestry to a limited number of founders. Cryptic relatedness can have a detrimental impact on nominal false positive rates for genetic association tests. An algorithm overcoming this problem is as follows...